In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation. Note that some sections of the implementation are optional and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.
# Load pickled data
import pickle
# reading training and testing data
training_file = 'train.p'
testing_file = 'test.p'
with open(training_file, mode='rb') as f:
train = pickle.load(f)
with open(testing_file, mode='rb') as f:
test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
# reading the signnames files
import pandas as pd
signnames_df = pd.read_csv("signnames.csv")
print(signnames_df.shape)
print(signnames_df)
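Since class ids are used throughout the notebook, a small id -> name lookup can be handy. A minimal sketch, assuming the CSV's columns are named ClassId and SignName (the code further below relies only on column positions):
# build an id -> sign-name lookup (the column names are an assumption)
id_to_name = dict(zip(signnames_df['ClassId'], signnames_df['SignName']))
print(id_to_name[14])  # e.g. 'Stop'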
The pickled data is a dictionary with 4 key/value pairs:

- 'features' is a 4D array containing raw pixel data of the traffic sign images: (num examples, width, height, channels).
- 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
- 'sizes' is a list of tuples, (width, height), representing the original width and height of each image.
- 'coords' is a list of tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. These coordinates refer to the original images; the pickled data contains resized (32 by 32) versions of these images.

Complete the basic data summary below.
import numpy as np
# Number of training examples
n_train = X_train.shape[0]
# Number of testing examples.
n_test = X_test.shape[0]
# shape of a traffic sign image
image_shape = X_train[0].shape
# unique classes/labels in the dataset
n_classes = len(np.unique(train['labels']))
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended; suggestions include plotting traffic sign images, plotting the count of each sign, etc.
The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.
NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.
### Data exploration and visualization
import matplotlib.pyplot as plt
# Visualizations will be shown in the notebook.
%matplotlib inline
import matplotlib.image as mpimg
# displays first image from every class
def show_images(images, labels):
"""Display the first image of each label."""
unique_labels = np.unique(labels)
print(unique_labels)
plt.figure(figsize=(15, 15))
i = 1
for label in unique_labels:
# Pick the first image for each label.
image = images[np.where(labels==label)][0]
plt.subplot(8, 8, i) # A grid of 8 rows x 8 columns
plt.axis('off')
i += 1
_ = plt.imshow(image)
plt.show()
show_images(X_train, y_train)
# displays the number of examples per class in the training and
# test sets in the form of a bar chart
def train_test_distribution(X_train, y_train, X_test, y_test):
bar_width = 0.35
train_class, train_counts = np.unique(y_train, return_counts=True)
test_class, test_counts = np.unique(y_test, return_counts=True)
print(np.asarray((train_class, train_counts)))
print(np.asarray((test_class, test_counts)))
rects1 = plt.bar(train_class, train_counts, bar_width,
alpha=0.4,
color='b',
label='Train')
rects2 = plt.bar(test_class + bar_width, test_counts, bar_width,
alpha=0.4,
color='r',
label='Test')
    plt.xlabel('Class id')
plt.ylabel('Class counts')
plt.title('Train and test sets class counts')
plt.legend()
plt.show()
train_test_distribution(X_train, y_train, X_test, y_test)
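The bar chart shows the imbalance visually; as a quick numeric follow-up, here is a minimal sketch printing the extremes of the training distribution:
# print the most and least represented classes in the training set
cls, cnt = np.unique(y_train, return_counts=True)
print("Most examples: class", cls[np.argmax(cnt)], "with", cnt.max())
print("Fewest examples: class", cls[np.argmin(cnt)], "with", cnt.min())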
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
- Neural network architecture
- Preprocessing techniques (normalization, RGB to grayscale, etc.)
- Number of examples per label (some classes have many more than others)
- Generating fake (augmented) data
Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like this one.
NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!
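For concreteness, here is a minimal sketch of that change, assuming the classroom LeNet-5 whose last hidden layer (fc2) has width 84; fc2 itself is not defined here:
import tensorflow as tf

# the classifier head must emit 43 logits (one per sign class) instead of 10;
# fc2_width = 84 is an assumption taken from the classroom LeNet-5
fc2_width, num_classes = 84, 43
fc3_W = tf.Variable(tf.truncated_normal(shape=(fc2_width, num_classes), mean=0.0, stddev=0.1))
fc3_b = tf.Variable(tf.zeros(num_classes))
# logits = tf.matmul(fc2, fc3_W) + fc3_b  # fc2: output of the previous LeNet layer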
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
import cv2
# scales the pixel values of every image in a set
# to the given range
def normalize_dataset(image_set):
normalized_set = list()
for image in image_set:
norm_img = np.zeros((32, 32))
norm_image = cv2.normalize(image,
norm_img,
alpha=-0.5,
beta=0.5,
norm_type=cv2.NORM_MINMAX,
dtype=cv2.CV_32F)
normalized_set.append(norm_image)
return np.asarray(normalized_set)
# equalizes the histograms of the R, G, B components of every
# image in a set, on an individual basis
def equalize_hist(image_set):
equalized_set = list()
for image in image_set:
rgb = cv2.split(image)
rgb[0] = cv2.equalizeHist(rgb[0])
rgb[1] = cv2.equalizeHist(rgb[1])
rgb[2] = cv2.equalizeHist(rgb[2])
# merge the equalized image
combined_image = cv2.merge(rgb)
equalized_set.append(combined_image)
return np.asarray(equalized_set)
# equalizes brightness of images
X_train_bright = equalize_hist(X_train)
X_test_bright = equalize_hist(X_test)
# the effect of brightness equalization can be seen below:
# all images are brighter, and images 12, 18, 19, 20, 31, 33 and 41,
# which could not be seen clearly earlier, are now much clearer
show_images(X_train_bright, y_train)
# scaling the images to range -0.5 to 0.5
X_train_norm = normalize_dataset(X_train_bright)
X_test_norm = normalize_dataset(X_test_bright)
show_images(X_train_norm, y_train)
Describe how you preprocessed the data. Why did you choose that technique?
Answer:
The R, G, B components of the images are first histogram-equalized for brightness. This is done because some of the images are very dull and the sign can hardly be seen. We can see the improvement in brightness in figure 3: some images can be seen much more clearly, and we can make out which sign they represent.
Next, the images are normalized to the range -0.5 to 0.5 to make the data well-conditioned.
Earlier I was considering using grayscale images, but traffic signs can make good use of the color channels, and converting to grayscale discards some of that information. It is one of those scenarios where we have a tradeoff between speed (three channels take longer to process) and accuracy. I experimented with both grayscale and color images, and for my network architecture color images worked better.
I also experimented with different scaling ranges (0.1 to 0.9; -1 to 1; 0 to 1; -0.5 to 0.5) and got the best results with -0.5 to 0.5.
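For reference, the grayscale path I experimented with can be sketched as below; to_grayscale is a hypothetical helper and is not part of the final pipeline:
# convert an RGB image set to grayscale, keeping a channel axis so the
# network input stays 4D: (N, 32, 32, 1)
def to_grayscale(image_set):
    gray_set = [cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) for image in image_set]
    return np.expand_dims(np.asarray(gray_set), axis=-1)

# X_train_gray = to_grayscale(X_train_bright)  # not executed in the final run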
# dividing data into training and validation.
# Since the classes are imbalanced, we are going to
# keep the same pattern in validation data as seen in
# training data.
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
# shuffle the dataset before splitting to train and
# validation set
X_train_norm, y_train_norm = shuffle(X_train_norm, y_train)
# the validation set comprises 30% of the training data;
# it will have the same proportion of images per class
# as the training set
X_train_set, X_val_set, y_train_set, y_val_set = train_test_split(X_train_norm,
                                                                   y_train_norm,
                                                                   test_size=0.3,
                                                                   stratify=y_train_norm,
                                                                   random_state=42)
print("Shape of training set: ", X_train_set.shape, y_train_set.shape)
print("Shape of validation set: ", X_val_set.shape, y_val_set.shape)
Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?
Answer:
I set aside 30% of the training data as a validation set. The split is stratified so that the validation set has the same class proportions as the training set; this way the validation accuracy is not skewed by the class imbalance shown in the train-test-distribution figure.
Optional generation of additional data: I experimented with generating additional data by creating 4 images per image in the train set, after the train set was divided into training and validation sets. But this did not really give me better accuracy. Another strategy worth trying is to create extra images only for the classes that have fewer images than the others, bringing the number of images per class up to a constant.
Since I got good results with non-augmented data, and the network takes a lot longer to train on the augmented set, I decided not to use the additional images. Code for generating the additional data is given below but not executed.
### Generate additional data (THIS CODE IS NOT USED)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
# input: an image, a maximum rotation angle (in degrees), and a
# maximum translation as a fraction of the image size in x and y
def augment_image(image, max_rotate, max_translate):
# random transformation in terms of angle and
# translations are performed
(height, width) = image.shape[:2]
center = (width/2, height/2)
random_angle = np.random.uniform(-max_rotate, max_rotate)
# rotate image with random angles
matrix_random_angle = cv2.getRotationMatrix2D(center, random_angle, 1.0)
image = cv2.warpAffine(image, matrix_random_angle, (width, height))
# translate image with random x and y coordinates
x_translate = max_translate * width * np.random.uniform(-1, 1)
y_translate = max_translate * height * np.random.uniform(-1, 1)
matrix_random_translate = np.array([[1, 0, x_translate], [0, 1, y_translate]])
image = cv2.warpAffine(image, matrix_random_translate, (width, height))
return image
# creating 4 augmented images for all images in the train set
import cv2
new_augmented_images = list()
new_y = list()
print(X_train_norm.shape)
for image_index in range(len(X_train_norm)):
    y = y_train_norm[image_index]  # labels matching the shuffled X_train_norm
for x in range(4):
new_image = augment_image(X_train_norm[image_index], 15, 0.2)
new_augmented_images.append(new_image)
new_y.append(y)
new_augmented_images = np.asarray(new_augmented_images)
print(new_augmented_images.shape)
new_y = np.asarray(new_y)
X_train_new = np.concatenate((X_train_norm, new_augmented_images), axis=0)
y_train_new = np.concatenate((y_train_norm, new_y), axis=0)
print("Shape of training after augmentation: ", X_train_new.shape)
from tensorflow.contrib.layers import flatten
import tensorflow as tf
def TrafficNet(x):
mu = 0
sigma = 0.05
    # a 1x1 convolution that lets the network learn its own color map
conv0_w = tf.Variable(tf.truncated_normal(shape=(1, 1, 3, 3), mean=mu, stddev=sigma))
conv0_b = tf.Variable(tf.zeros(3))
conv0 = tf.nn.conv2d(x, conv0_w, strides=[1,1,1,1], padding='SAME') + conv0_b
conv0 = tf.nn.relu(conv0)
# conv1 input shape 32x32x3
# output shape 28x28x32
conv1_w = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 32), mean=mu, stddev=sigma))
conv1_b = tf.Variable(tf.zeros(32))
conv1 = tf.nn.conv2d(conv0,
conv1_w,
strides=[1, 1, 1, 1],
padding='VALID') + conv1_b
# activation
conv1 = tf.nn.relu(conv1)
# pooling input shape = 28x28x32
# output shape 14x14x32
conv1 = tf.nn.max_pool(conv1,
ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1],
padding='SAME')
conv1 = tf.nn.dropout(conv1, keep_prob)
# input shape 14x14x32
# output shape 10x10x64
conv2_w = tf.Variable(tf.truncated_normal(shape=(5, 5, 32, 64), mean=mu, stddev=sigma))
conv2_b = tf.Variable(tf.zeros(64))
conv2 = tf.nn.conv2d(conv1, conv2_w, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
# activation
conv2 = tf.nn.relu(conv2)
# pooling
# output shape 10x10x64
conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
#Dropout Layer
conv2 = tf.nn.dropout(conv2, keep_prob)
# input shape 5x5x64
#output shape 4x4x128
conv3_w = tf.Variable(tf.truncated_normal(shape=(2, 2, 64, 128), mean=mu, stddev=sigma))
conv3_b = tf.Variable(tf.zeros(128))
conv3 = tf.nn.conv2d(conv2, conv3_w, strides=[1, 1, 1, 1], padding='VALID') + conv3_b
# activation
conv3 = tf.nn.relu(conv3)
# pooling
# output shape 4x4x128
conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')
#Dropout Layer
conv3 = tf.nn.dropout(conv3, keep_prob)
# flatten
conv3 = flatten(conv3)
# fully connected layer
# Input = 2048. Output = 1024.
fc1_W = tf.Variable(tf.truncated_normal(shape=(2048, 1024), mean = mu, stddev = sigma))
fc1_b = tf.Variable(tf.zeros(1024))
fc1 = tf.matmul(conv3, fc1_W) + fc1_b
#Dropout Layer
fc1 = tf.nn.dropout(fc1, keep_prob)
# fully connected layer
# Input = 1024. Output = 120.
fc2_W = tf.Variable(tf.truncated_normal(shape=(1024, 120), mean = mu, stddev = sigma))
fc2_b = tf.Variable(tf.zeros(120))
fc2 = tf.matmul(fc1, fc2_W) + fc2_b
#Dropout Layer
fc2 = tf.nn.dropout(fc2, keep_prob)
# fully connected layer
# Fully Connected. Input = 120. Output = 43.
fc3_W = tf.Variable(tf.truncated_normal(shape=(120, 43), mean = mu, stddev = sigma))
fc3_b = tf.Variable(tf.zeros(43))
logits = tf.matmul(fc2, fc3_W) + fc3_b
return logits
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer:
Refer to the figure below for the model architecture. It is a convolutional neural network consisting of a 1x1 convolution followed by three convolution / activation / pooling blocks, then three fully-connected layers, and finally a softmax classifier.
The first layer is a 1x1 convolution with a depth of 3; it is there so that the model can learn the best color map for itself.
Each pooling layer and each of the first two fully-connected layers is followed by dropout with a keep probability of 0.5 during training. This helps prevent overfitting.
Connectivity can be shown as
INPUT => CONV => RELU => CONV => RELU => POOL => DROPOUT => CONV => RELU => POOL => DROPOUT => CONV => RELU => POOL => DROPOUT => FC => DROPOUT => FC => DROPOUT => FC
Sizes can be read from the figure as well as the code. Strides and filter depths are the result of parameter tuning.
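These sizes can be double-checked with a small helper (a sketch, not part of the model), using the standard VALID/SAME output-size formulas:
# compute the spatial output size of a conv/pool layer
def out_size(size, filt, stride=1, padding='VALID'):
    if padding == 'VALID':
        return (size - filt) // stride + 1
    return (size + stride - 1) // stride  # SAME: ceil(size / stride)

s = out_size(32, 5)             # conv1: 32 -> 28
s = out_size(s, 2, 2, 'SAME')   # pool1: 28 -> 14
s = out_size(s, 5)              # conv2: 14 -> 10
s = out_size(s, 2, 2, 'SAME')   # pool2: 10 -> 5
s = out_size(s, 2)              # conv3:  5 -> 4
s = out_size(s, 2, 1, 'SAME')   # pool3 (stride 1): 4 -> 4
print(s * s * 128)              # 2048, the flattened input to fc1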
image = cv2.cvtColor(cv2.imread("cnn_arc.jpeg", 1), cv2.COLOR_BGR2RGB)
plt.figure(figsize=(40, 60))
plt.imshow(image)
# tensorflow placeholders for input, labels and
# dropouts
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
# Placeholder for dropout keep probability
keep_prob = tf.placeholder(tf.float32)
one_hot_y = tf.one_hot(y, 43)
# learning rate
rate = 0.001
# softmax logits
logits = TrafficNet(x)
# finding cross entropy after passing logits through
# softmax
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_y)
# defining loss for optimization by minimizing the
# mean cross entropy
loss_operation = tf.reduce_mean(cross_entropy)
# using Adam Optimizer for minimizing loss with the
# learning rate specified above
optimizer = tf.train.AdamOptimizer(learning_rate=rate)
training_operation = optimizer.minimize(loss_operation)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
BATCH_SIZE = 256
# evaluate function for finding accuracy of a batch and later
# calculating overall accuracy for a given epoch over
# validation set
def evaluate(X_data, y_data):
num_examples = len(X_data)
total_accuracy = 0
sess = tf.get_default_session()
for offset in range(0, num_examples, BATCH_SIZE):
batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y:batch_y, keep_prob:1.0})
total_accuracy += (accuracy * len(batch_x))
return total_accuracy/num_examples
# training the model
DROPOUT = 0.50
EPOCHS = 20
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
num_examples = len(X_train_set)
print("TRAINING..")
print()
for i in range(EPOCHS):
X_train, y_train = shuffle(X_train_set, y_train_set)
for offset in range(0, num_examples, BATCH_SIZE):
end = offset + BATCH_SIZE
batch_x, batch_y = X_train[offset:end], y_train[offset:end]
sess.run(training_operation, feed_dict={x:batch_x, y:batch_y, keep_prob:DROPOUT})
validation_accuracy = evaluate(X_val_set, y_val_set)
print("EPOCH {} ...".format(i+1))
print("VALIDATION ACCURACY = {:.3f}".format(validation_accuracy))
print()
print("Training completed")
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer:
Type of optimizer: Adam, since it is more efficient in many cases. I did not experiment with other optimizers.
Batch size: experimented with sizes of 128, 256 and 512 and found that 256 works best.
Epochs: 20. Experimented with 30 and 40 epochs but they did not have much positive impact.
Loss function: minimizing the mean cross entropy.
Nonlinearity: ReLU.
Weights: initializing the weights as random samples from a truncated normal distribution with zero mean and standard deviation 0.05 gave the best results. I also experimented with stddev=1 and with a constant value of 0.
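For concreteness, a sketch of the three initializations referred to above, shown for conv1's 5x5x3x32 filter bank (illustrative only):
# the three initializations compared
init_best = tf.truncated_normal((5, 5, 3, 32), mean=0.0, stddev=0.05)  # gave best results
init_wide = tf.truncated_normal((5, 5, 3, 32), mean=0.0, stddev=1.0)   # also tried
init_zero = tf.zeros((5, 5, 3, 32))                                    # also tried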
What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.
Answer:
I did not start with the LeNet architecture but built this multi-layer model by incrementally adding one layer at a time. Initially it was a single-layer CNN with one fully-connected layer. After adding another layer, I was getting an accuracy of 88% on the validation set.
After seeing this result, I knew how a basic model would perform, so I went ahead and implemented the LeNet model. My model is also inspired by Vivek Yadav's post on Medium about his architecture, but it is not a copy of it; it is more of a mixture of LeNet and his model.
Experiments showed that adding the first layer for the color scheme adds a bit to the accuracy.
Since LeNet was designed for grayscale images, I wanted to make my model wider by adding more filters than LeNet uses. I think this really helped because we capture more depth of the image, given that these traffic signs are far more complex than the digits LeNet was built for.
Strides and filter sizes are mostly inspired by LeNet. Since LeNet has only two convolutional layers, the strides in the third layer of my model are a result of trial and error.
I experimented with and without dropout, and using dropout was the clear winner. I tried keep probabilities of 0.25, 0.75 and 0.50; the best results were obtained with 0.5.
I went from one fully-connected layer, to two, and finally to three, and the accuracy increased each time.
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
# combining train and validation set to be run
# on the same architecture
# since the hyper-parameters and model architecture
# are finalized, we can utilize the remaining 30%
# of the dataset to train on
DROPOUT = 0.50
EPOCHS = 20
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
num_examples = len(X_train_norm)
print("TRAINING..")
print()
for i in range(EPOCHS):
X_train, y_train = shuffle(X_train_norm, y_train_norm)
for offset in range(0, num_examples, BATCH_SIZE):
end = offset + BATCH_SIZE
batch_x, batch_y = X_train[offset:end], y_train[offset:end]
sess.run(training_operation, feed_dict={x:batch_x, y:batch_y, keep_prob:DROPOUT})
print("EPOCH {} ...".format(i+1))
print()
saver.save(sess, 'trafficNet')
print("Model saved")
# first of all, check how the model performed
# on the test set to gauge its performance
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint('.'))
test_accuracy = evaluate(X_test_norm, y_test)
print("Test Accuracy = {:.3f}".format(test_accuracy))
# following are the 21 images taken from the web
# to test the model on.
# a few of these images belong to classes on which the
# model was not trained; their actual labels are marked as
# -1 and their ground truth is shown as NA in the plots
image_list = ['donot.jpeg', 'stops.jpeg', 'yield.jpeg', 'ahead.jpeg', 'round.jpeg', '120.jpeg', 'slip.jpeg',
'road.jpeg', '20.jpg', '30.jpeg', 'sl50.jpeg', 'ped.jpeg', 'pred_tri.jpeg', 'yellow_pred.jpeg',
'children_crossing.jpg', 'bicycle.jpg', 'ele.jpeg', 'max40.jpeg', 'noleft.jpeg', 'norightt.jpeg',
'wr.jpeg']
reshaped_images = list()
# resizing images to 32x32 as required by the network
for image in image_list:
image = cv2.cvtColor(cv2.imread(image, 1), cv2.COLOR_BGR2RGB)
resized_image = cv2.resize(image, (32, 32))
reshaped_images.append(resized_image)
# preprocess these images just as training images were
X_new_test_bright = equalize_hist(reshaped_images)
X_new_test_norm = normalize_dataset(X_new_test_bright)
# defining the prediction function which gives the
# predicted label
prediction_operation = tf.argmax(logits, 1)
# actual labels of the new test images;
# -1 marks signs whose class is not present in the training set
expected_predictions = [17, 14, 13, 35, 40, 8, 23, 25, 0, 1, 2, 27, 27, 27, 28, 29, -1, -1, -1, -1, -1]
# calculate predictions in the current session
def get_preds(X_data):
sess = tf.get_default_session()
prediction = sess.run(prediction_operation, feed_dict={x: X_data, keep_prob:1.0})
return prediction
# seeing results on new test set
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint('.'))
preds = get_preds(X_new_test_norm)
print("New predictions:", preds)
print("Expected Predictions: ", expected_predictions)
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.
Answer:
First of all, let's look at the overall performance of the network on the new test images.
Specifically, I would like to discuss the images that were not predicted correctly.
1. 120 km/h - not very obvious. Maybe the curve on top of the 2, combined with the 1 at the beginning, looks like an 8.
2. Slippery road - does resemble animal crossing on some level. This class has fewer examples than most, so the model may not have learned it properly.
3. Speed limit 20 - very few examples might have led to the misclassification.
4. Speed limit 30 - confusion may arise from the similarity between the digits 3 and 8.
5. Speed limit 50 - due to the presence of a shadow, the 5 does look like an 8.
6. Pedestrians - I used three different images for this class and the classifier failed on all of them. This is probably because there are only 240 examples for this class, which is not sufficient for the model. The shape of the sign board also matters a lot.
7. Children crossing - I can't think of why it is predicted this way.
8. Bicycles crossing - might resemble beware of ice/snow on some level.
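The sample-count claims above can be checked directly against the training labels (a quick sketch; the class ids follow expected_predictions):
# print training-set counts for the classes discussed above
cls, cnt = np.unique(train['labels'], return_counts=True)
for class_id in [8, 23, 0, 1, 2, 27, 28, 29]:
    name = signnames_df[signnames_df['ClassId'] == class_id].values[0][1]
    print(class_id, name, cnt[cls == class_id][0])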
# Display the predictions and the ground truth visually.
fig = plt.figure(figsize=(10, 10))
for i in range(len(reshaped_images)):
truth = expected_predictions[i]
prediction = preds[i]
plt.subplot(11, 2,1+i)
plt.axis('off')
color='green' if truth == prediction else 'red'
if truth==-1:
truth = 'NA'
else:
truth = signnames_df[signnames_df['ClassId']==truth].values[0][1]
prediction = signnames_df[signnames_df['ClassId']==prediction].values[0][1]
plt.text(40, 10, "Truth: {0}\nPrediction: {1}".format(truth, prediction),
fontsize=12, color=color)
plt.imshow(reshaped_images[i])
Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to check this is to compute the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.
NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.
Answer: My model does not perform well on the new images. If I leave out the images the model was not trained on (marked NA), I get 6 out of 16 images right, which means the accuracy is 37.5%.
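That figure can be reproduced programmatically (a minimal sketch):
# score only the signs whose classes exist in the training set
expected = np.array(expected_predictions)
known = expected != -1
new_img_accuracy = np.mean(np.array(preds)[known] == expected[known])
print("Accuracy on known signs = {:.1f}%".format(new_img_accuracy * 100))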
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the corresponding class ids.
Take this numpy array as an example:
# (5, 6) array
a = np.array([[ 0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497,
0.12789202],
[ 0.28086119, 0.27569815, 0.08594638, 0.0178669 , 0.18063401,
0.15899337],
[ 0.26076848, 0.23664738, 0.08020603, 0.07001922, 0.1134371 ,
0.23892179],
[ 0.11943333, 0.29198961, 0.02605103, 0.26234032, 0.1351348 ,
0.16505091],
[ 0.09561176, 0.34396535, 0.0643941 , 0.16240774, 0.24206137,
0.09155967]])
Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:
TopKV2(values=array([[ 0.34763842, 0.24879643, 0.12789202],
[ 0.28086119, 0.27569815, 0.18063401],
[ 0.26076848, 0.23892179, 0.23664738],
[ 0.29198961, 0.26234032, 0.16505091],
[ 0.34396535, 0.24206137, 0.16240774]]), indices=array([[3, 0, 5],
[0, 1, 4],
[0, 5, 1],
[1, 3, 5],
[1, 4, 3]], dtype=int32))
Looking just at the first row, we get [ 0.34763842, 0.24879643, 0.12789202]; you can confirm these are the 3 largest probabilities in a. You'll also notice [3, 0, 5] are the corresponding indices.
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint('.'))
logits_ph = tf.placeholder('float', [None, 43])
softmax = tf.nn.softmax(logits_ph)
# obtain logits for new images calculated by the model
logits_received = sess.run(logits, feed_dict={x: X_new_test_norm, keep_prob: 1.})
# operation for top k values and indices
top_5_values, top_5_indices = tf.nn.top_k(softmax, k=5)
# find the top 5 values and their indices based on the logits received
top_5_vals, top_5_ids = sess.run([top_5_values, top_5_indices], feed_dict={logits_ph: logits_received})
def pred_certainty_str(top_5_val, top_5_indices):
# Convert top k indices into strings
top_5_pred = [signnames_df[signnames_df['ClassId']==index].values[0][1] for index in top_5_indices]
predictions = ''
for i in range(5):
predictions += '%s: %.2f%%\n' % (top_5_pred[i].replace('\n', ''), top_5_val[i] * 100)
return predictions
# correct prediction, very sure
plt.imshow(reshaped_images[0])
print(pred_certainty_str(top_5_vals[0], top_5_ids[0]))
# correct prediction, highly sure
plt.imshow(reshaped_images[1])
print(pred_certainty_str(top_5_vals[1], top_5_ids[1]))
# correct prediction
# completely sure
plt.imshow(reshaped_images[2])
print(pred_certainty_str(top_5_vals[2], top_5_ids[2]))
# correct prediction
# completely sure
plt.imshow(reshaped_images[3])
print(pred_certainty_str(top_5_vals[3], top_5_ids[3]))
# correct prediction
# almost completely sure
plt.imshow(reshaped_images[4])
print(pred_certainty_str(top_5_vals[4], top_5_ids[4]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[5])
print(pred_certainty_str(top_5_vals[5], top_5_ids[5]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[6])
print(pred_certainty_str(top_5_vals[6], top_5_ids[6]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[8])
print(pred_certainty_str(top_5_vals[8], top_5_ids[8]))
# incorrect prediction
# correct prediction not in top 5
plt.imshow(reshaped_images[9])
print(pred_certainty_str(top_5_vals[9], top_5_ids[9]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[10])
print(pred_certainty_str(top_5_vals[10], top_5_ids[10]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[11])
print(pred_certainty_str(top_5_vals[11], top_5_ids[11]))
# incorrect prediction
# correct prediction in top 5
plt.imshow(reshaped_images[12])
print(pred_certainty_str(top_5_vals[12], top_5_ids[12]))
# incorrect prediction
# correct prediction not in top 5
plt.imshow(reshaped_images[13])
print(pred_certainty_str(top_5_vals[13], top_5_ids[13]))
# incorrect prediction
# correct prediction not in top 5
plt.imshow(reshaped_images[14])
print(pred_certainty_str(top_5_vals[14], top_5_ids[14]))
# incorrect prediction
# correct prediction not in top 5
plt.imshow(reshaped_images[15])
print(pred_certainty_str(top_5_vals[15], top_5_ids[15]))
Answer:
Out of the 10 images that were predicted incorrectly, 6 have the correct label among the top 5 predictions.
I believe classes like pedestrian crossing, children crossing and beware of ice/snow could really benefit from more samples.
The right approach would be to create more samples by augmentation for these classes and bring the number of samples per class up to a constant (a sketch of this idea is given after this answer).
I ran the same model (except for the first layer) on grayscale images and found that the results were more or less the same; running the model on grayscale images is just faster.
The same model was also run on augmented images, where extra images were created so that every class had 2250 samples. Due to lack of time I could not design a separate model for this data; running it through the same model gave 93.4% accuracy on the test set with a keep probability of 0.75, even though the validation accuracy reached up to 99%. This model is overfitting, but I did not have time to dig into it. It is an interesting aspect that I would like to explore more deeply later.
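The balancing experiment mentioned above could be sketched as follows (illustrative only and, like the earlier augmentation cell, not executed; the target of 2250 samples per class is taken from the experiment described):
# augment every class up to a fixed target count, reusing augment_image
TARGET = 2250
balanced_X, balanced_y = list(X_train_norm), list(y_train_norm)
cls, cnt = np.unique(y_train_norm, return_counts=True)
for class_id, count in zip(cls, cnt):
    class_images = X_train_norm[y_train_norm == class_id]
    for i in range(TARGET - count):
        source = class_images[i % len(class_images)]
        balanced_X.append(augment_image(source, 15, 0.2))
        balanced_y.append(class_id)
balanced_X = np.asarray(balanced_X)
balanced_y = np.asarray(balanced_y)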
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.